Possible fix for the GPU transpose utility #1996
Closed
I identified a bug in the GPU transpose utility and have attempted a fix. The bug shows up when using the GPU approximate nearest neighbors method IVFPQ and is reproducible with the following code (requires cuML):

This code throws the following error:
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runTransposeAny(faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, int, int, faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, cudaStream_t) [with T = float; int Dim = 3; cudaStream_t = CUstream_st*] at <...>/faiss/faiss/gpu/utils/Transpose.cuh:207; details: CUDA error 9 invalid configuration argument Aborted (core dumped)
The problem appears when the number of samples in the index is above 65535. It seems to stem from the fact that, when preparing the launch of the transposeOuter CUDA kernel, the y-dimension of the grid of thread blocks is set to the number of rows/samples. This is a problem because the maximum y- or z-dimension of a grid of thread blocks is 65535 (see the CUDA compute capability technical specifications). In my understanding, this is what triggers the CUDA error 9 invalid configuration argument during the kernel launch that follows.

Here is the part of the code concerned by the problem:
faiss/faiss/gpu/utils/Transpose.cuh
Lines 160 to 175 in 7c2d238
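
To illustrate the limit discussed above, here is a minimal host-side C++ sketch of one common way to stay within it: clamp the y-dimension of the grid to 65535 and let each block stride over the remaining rows. This is only an illustration under stated assumptions, not the code from Transpose.cuh and not necessarily the change made in this pull request. The names kMaxGridY, clampGridY and simulateRowCoverage are hypothetical, and the loops emulate on the CPU what blockIdx.y and gridDim.y would do inside a kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum y- (and z-) extent of a CUDA grid of thread blocks.
constexpr int kMaxGridY = 65535;

// Hypothetical helper: number of blocks to request along y for numRows rows.
// Requesting more than kMaxGridY blocks would make the launch fail with
// "invalid configuration argument".
inline int clampGridY(int numRows) {
    assert(numRows >= 0);
    return std::min(numRows, kMaxGridY);
}

// Host-side emulation of a grid-stride loop over rows. In a real kernel,
// blockY would be blockIdx.y and gridY would be gridDim.y.
// Returns how many times each row was visited.
inline std::vector<int> simulateRowCoverage(int numRows) {
    std::vector<int> visits(static_cast<std::size_t>(numRows), 0);
    const int gridY = clampGridY(numRows);
    for (int blockY = 0; blockY < gridY; ++blockY) {
        // Each block handles rows blockY, blockY + gridY, blockY + 2 * gridY, ...
        for (int row = blockY; row < numRows; row += gridY) {
            ++visits[static_cast<std::size_t>(row)];
        }
    }
    return visits;
}
```

With this scheme, an index of 70000 samples would launch 65535 blocks along y, and the first 4465 blocks would each process two rows, so every row is still handled exactly once.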
Tagging issues:
rapidsai/cuml#4020
#1771
#1835